VLSI Architecture and Design
Integrated circuit technology is rapidly approaching a state where feature sizes of one micron or less are tractable. Chip sizes are increasing slowly. Together, these two developments considerably increase the complexity of chip design. The physical characteristics of integrated circuit technology are also changing. The cost of communication will dominate, making new architectures and algorithms both feasible and desirable. A large number of processors on a single chip will be possible. The cost of communication will make designs that enforce locality superior to other types of designs.
Scaling down feature sizes increases the delay that wires introduce. Even the delay of metal wires will become significant. Time tends to be a local property, which will make the design of globally synchronous systems more difficult. Self-timed systems will eventually become a necessity.
With chip complexity, measured in terms of logic devices, increasing by more than an order of magnitude over the next few years, efficient design methodologies and tools become crucial. Hierarchical and structured design are ways of dealing with the complexity of chip design. Structured design focuses on the information flow and enforces a high degree of regularity. Both hierarchical and structured design encourage the use of cell libraries. The geometry of the cells in such libraries should be parameterized so that, for instance, cells can adjust their size to neighboring cells and make the proper interconnection. Cells with this quality can be used as a basis for "Silicon Compilers".
A mathematical approach to modelling the flow of data and control in computational networks
This paper proposes a mathematical formalism for the synthesis and qualitative analysis of computational networks that treats data and control in the same manner. Expressions in this notation are given a direct interpretation in the implementation domain. Topology, broadcasting, pipelining, and similar properties of implementations can be determined directly from the expressions.
This treatment of computational networks emphasizes the space/time tradeoff of implementations. A full instantiation in space of most computational problems is unrealistic, even in VLSI (Finnegan [4]). Therefore, computations also have to be at least partially instantiated in the time domain, requiring the use of explicit control mechanisms, which typically cause the data flow to be nonstationary and sometimes turbulent.
Concurrent Algorithms for the Conjugate Gradient Method
A few concurrent algorithms for the basic conjugate gradient method are devised and discussed. Most of the algorithms have a topology that is naturally determined by characteristic dimensions of the system and the operations of each step of the conjugate gradient method. The topologies map well onto buildable structures of sparsely interconnected processors while preserving unit communication distance. The topologies of the algorithms are:
1) A binary tree.
2) A composition of a binary tree and a ring, the nodes of which form the leaves of the tree.
3) A linear array with some additional processing elements.
It is also discussed how these algorithms map onto Boolean n-cubes. The algorithms all have the property that a communication operation is associated with each computation.
No claim is made as to the optimality, from a space-time complexity point of view, of the algorithms presented here. However, the processor utilization for some algorithms and topologies is close to 100%, and the space*time complexity of those algorithms is of the same order as the arithmetic complexity of common sequential machine algorithms.
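As a point of reference for the abstract above, the following is a minimal sequential sketch of the basic conjugate gradient method, not the paper's concurrent algorithms. It assumes a symmetric positive definite matrix stored as a list of rows. The inner products are summed by pairwise reduction, which mirrors how a binary tree of processors could combine partial sums. All function names (`tree_sum`, `dot`, `matvec`, `conjugate_gradient`) and the `tol` and `max_iter` parameters are illustrative assumptions.

```python
def tree_sum(values):
    """Sum values by pairwise reduction, level by level, as a binary tree would."""
    level = list(values)
    if not level:
        return 0.0
    while len(level) > 1:
        # Each pair at one level is independent and could be added concurrently.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element is passed up unchanged
        level = nxt
    return level[0]


def dot(u, v):
    """Inner product: elementwise products followed by a tree reduction."""
    return tree_sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    """Matrix-vector product, one inner product per row."""
    return [dot(row, x) for row in A]


def conjugate_gradient(A, b, tol=1e-12, max_iter=None):
    """Solve A x = b for symmetric positive definite A (assumed, not checked)."""
    n = len(b)
    max_iter = 10 * n if max_iter is None else max_iter
    x = [0.0] * n
    r = [float(bi) for bi in b]  # residual b - A x, with x = 0 initially
    p = list(r)                  # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old <= tol * tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)                      # step length
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        beta = rs_new / rs_old                           # direction update factor
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x
```

Each iteration needs one matrix-vector product and two inner products; these are the steps whose data movement a concurrent formulation has to map onto a topology.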
A Computational Array for the QR-Method
The QR-method is a method for the solution of linear systems of equations. The matrix R is upper triangular and Q is a unitary matrix. In equation solving, Q is not always computed explicitly. The matrix R can be obtained by applying a sequence of unitary transformations to the matrix defining the system of equations. Householder's method or Givens' method can be used to determine the unitary transformation matrices. This paper describes a concurrent algorithm and corresponding array for computing the triangular matrix R by Householder transformations. Particular attention is given to issues such as broadcasting and pipelining.
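The reduction to R by Householder transformations can be sketched sequentially as follows. This is a minimal illustration for real matrices stored as lists of rows, not the paper's concurrent array; the name `householder_r` is an assumption. Q is never formed, in line with the remark that Q is not always computed explicitly.

```python
import math


def householder_r(A):
    """Return the upper triangular R of A = QR using Householder reflections."""
    R = [[float(t) for t in row] for row in A]
    m, n = len(R), len(R[0])
    for k in range(min(m - 1, n)):
        # Part of column k on and below the diagonal.
        x = [R[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column is already zero below the diagonal
        # Reflection vector; the sign choice avoids cancellation in v[0].
        v = x[:]
        v[0] += math.copysign(norm_x, x[0])
        vtv = sum(t * t for t in v)
        # Apply H = I - 2 v v^T / (v^T v) to columns k..n-1.
        # Every column needs the same v, which is where broadcasting
        # (or pipelining v through the array) enters a concurrent design.
        for j in range(k, n):
            s = sum(v[i] * R[k + i][j] for i in range(m - k))
            f = 2.0 * s / vtv
            for i in range(m - k):
                R[k + i][j] -= f * v[i]
    return R
```

Because each reflection is orthogonal, R satisfies R^T R = A^T A up to rounding, which gives a simple check without forming Q.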
Pipelined linear equation solvers and VLSI
Many of the commonly used methods for the solution of linear systems of equations on sequential machines can be given a concurrent formulation. The concurrent algorithms take advantage of the independence of operations in order to reduce the time complexity of the methods. During the computations specified by an algorithm, data has to be routed to the various places of computation. Pipelining can be used to avoid broadcasting in VLSI arrays for computation. Pipelining will in general allow for a reduced cycle time but may force data to be spread out in time, as is the case for Gaussian elimination. The required spacing depends on the pipelining and the data flow.
In the paper, concurrent algorithms and their pipelining for Gaussian elimination, Householder transformations and Givens rotations are discussed. Gaussian elimination and Givens rotations can use two-dimensional arrays, while Householder transformations use a one-dimensional array. If partial pivoting is necessary in Gaussian elimination, then one dimension of the array is essentially lost and a linear array is almost as efficient as a two-dimensional array. If partial pivoting is necessary in Gaussian elimination, Householder transformations, which are numerically stable, may perform the triangularization in shorter time. The amount of arithmetic that a node in the arrays performs is somewhat different for the different methods. The difference is largest for the boundary cells. However, it should be feasible to design a common node of very low complexity that very efficiently supports a range of methods for the solution of linear systems of equations.
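To make the role of partial pivoting concrete, here is a minimal sequential sketch of Gaussian elimination with partial pivoting followed by back substitution. It is an illustration only, not the pipelined array algorithm of the paper; the function name `solve_partial_pivoting` is an assumption. The pivot search scans an entire column before any row of that step can be eliminated, which is one way to read the remark that one dimension of the array is essentially lost.

```python
def solve_partial_pivoting(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [A | b], copied so the inputs are not modified.
    M = [[float(t) for t in row] + [float(bi)] for row, bi in zip(A, b)]
    for k in range(n):
        # Pivot search: the largest magnitude entry in column k, rows k..n-1.
        # This step depends on every remaining row of the column.
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Elimination: the row updates are independent of each other.
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    # Back substitution on the upper triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

The example system in the test below has a zero in the leading position, so elimination without row exchange would fail on it.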